1 The complexity of XPath query evaluation and XML typing
Georg Gottlob, Christoph Koch, Reinhard Pichler, Luc Segoufin 
March 2005 Journal of the ACM (JACM), Volume 52 Issue 2

Publisher: ACM Press 

Full text available: pdf(447.53 KB) Additional Information: full citation, abstract, references, index terms

We study the complexity of two central XML processing problems. The first is XPath 1.0 
query processing, which has been shown to be in PTIME in previous work. We prove that 
both the data complexity and the query complexity of XPath 1.0 fall into lower (highly 
parallelizable) complexity classes, while the combined complexity is PTIME-hard. 
Subsequently, we study the sources of this hardness and identify a large and practically 
important fragment of XPath 1.0 for which the combined complexity is ...



Keywords: Complexity, DTD, LOGCFL, XML, XPath 



2 Streaming XML: XPath queries on streaming data
Feng Peng, Sudarshan S. Chawathe 

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on 
Management of data 

Publisher: ACM Press 

Full text available: pdf(433.73 KB) Additional Information: full citation, abstract, references, citings, index terms

We present the design and implementation of the XSQ system for querying streaming 
XML data using XPath 1.0. Using a clean design based on a hierarchical arrangement of 
pushdown transducers augmented with buffers, XSQ supports features such as multiple 
predicates, closures, and aggregation. XSQ not only provides high throughput, but is also 
memory efficient: It buffers only data that must be buffered by any streaming XPath 
processor. We also present an empirical study of the performance character ... 

3 Efficient filtering of XML documents with XPath expressions
C.-Y. Chan, P. Felber, M. Garofalakis, R. Rastogi 

December 2002 The VLDB Journal - The International Journal on Very Large Data
Bases, Volume 11 Issue 4

Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(383.34 KB) Additional Information: full citation, abstract, citings, index terms

The publish/subscribe paradigm is a popular model for allowing publishers (i.e., data
generators) to selectively disseminate data to a large number of widely dispersed
subscribers (i.e., data consumers) who have registered their interest in specific 
information items. Early publish/subscribe systems have typically relied on simple 
subscription mechanisms, such as keyword or "bag of words" matching, or simple 
comparison predicates on attribute values. The emergence of XML as a standar ... 

Keywords: Data dissemination, Document filtering, Index structure, XML, XPath 



4 Query execution and optimization: On the memory requirements of XPath evaluation
over XML streams

Ziv Bar-Yossef, Marcus Fontoura, Vanja Josifovski 
June 2004 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART 
symposium on Principles of database systems PODS '04 

Publisher: ACM Press 

Full text available: pdf(272.43 KB) Additional Information: full citation, abstract, references

The important challenge of evaluating XPath queries over XML streams has sparked much 
interest in the past two years. A number of algorithms have been proposed, supporting
wider fragments of the query language, and exhibiting better performance and memory 
utilization. Nevertheless, all the algorithms known to date use a prohibitively large 
amount of memory for certain types of queries. A natural question then is whether this 
memory bottleneck is inherent or just an artifact of the proposed algor ... 

5 XML parsing and stylesheets: Incremental maintenance for materialized XPath/XSLT
views

Makoto Onizuka, Fong Yee Chan, Ryusuke Michigami, Takashi Honishi 
May 2005 Proceedings of the 14th international conference on World Wide Web 
Publisher: ACM Press 

Full text available: pdf(371.45 KB) Additional Information: full citation, abstract, references, index terms

This paper proposes an incremental maintenance algorithm that efficiently updates the 
materialized XPath/XSLT views defined using XPath expressions in XP{[],*,//,vars}. The
algorithm consists of two processes. 1) The dynamic execution flow of an XSLT program is 
stored as an XT (XML Transformation) tree during the full transformation. 2) In response 
to a source XML data update, the impacted portions of the XT-tree are identified and 
maintained by partially re-evaluating the XSLT progra ... 

Keywords: XML, XPath, XSLT, materialized view, view maintenance 



6 Database theory, technology, and applications (DTTA): MTree: an XML XPath graph 
index 

P. Mark Pettovello, Farshad Fotouhi 

April 2006 Proceedings of the 2006 ACM symposium on Applied computing SAC '06 

Publisher: ACM Press 

Full text available: pdf(153.27 KB) Additional Information: full citation, abstract, references, index terms

This paper introduces the MTree index algorithm, a special purpose XML XPath index 
designed to meet the needs of the hierarchical XPath query language. With the increasing 
importance of XML, XPath, and XQuery, several methods have been proposed for creating 
XML structure indexes and many variants using relational technology have been 
proposed. This work proposes a new XML structure index, called MTree, which is designed 
to be optimal for traversing all XPath axes. The primary feature of MTree li ... 

Keywords: XML, XPath, graph, index, threaded paths 

7 Access control for XML data: Specifying access control policies for XML documents
with XPath 

Irini Fundulaki, Maarten Marx 

June 2004 Proceedings of the ninth ACM symposium on Access control models and 
technologies 

Publisher: ACM Press 

Full text available: pdf(186.26 KB) Additional Information: full citation, abstract, references, index terms

Access control for XML documents is a non-trivial topic, as can be witnessed from the 
number of approaches presented in the literature. Trying to compare these, we 
discovered the need for a simple, clear and unambiguous language to state the declarative
semantics of an access control policy. All current approaches state the semantics in 
natural language, which has none of the above properties. This makes it hard to assess 
whether the proposed algorithms are correct (i.e., really implement the des ... 

Keywords: xml, xml access control, xpath 



8 Query processing for XML data: Meta-data indexing for XPath location steps
SungRan Cho, Nick Koudas, Divesh Srivastava 

June 2006 Proceedings of the 2006 ACM SIGMOD international conference on 
Management of data SIGMOD '06 

Publisher: ACM Press 

Full text available: pdf(175.99 KB) Additional Information: full citation, abstract, references, index terms

XML is the de facto standard for data representation and exchange over the Web. Given 
the diversity of the information available in XML, it is very useful to annotate XML data 
with a wide variety of meta-data, such as quality and sensitivity. When querying such 
XML data, say using XPath, it is important to efficiently identify the data that meet 
specified constraints on the meta-data. For example, different users may be satisfied with 
different levels of quality guarantees, or may only have acce ... 

Keywords: XML, hierarchical inheritance, meta-data index 





9 Streaming XML: Stream processing of XPath queries with predicates

Ashish Kumar Gupta, Dan Suciu
June 2003 Proceedings of the 2003 ACM SIGMOD international conference on 
Management of data 

Publisher: ACM Press 

Full text available: pdf(464.60 KB) Additional Information: full citation, abstract, references, citings, index terms

We consider the problem of evaluating large numbers of XPath filters, each with many 
predicates, on a stream of XML documents. The solution we propose is to lazily construct 
a single deterministic pushdown automaton, called the XPush Machine, from the given
XPath filters. We describe a number of optimization techniques to make the lazy XPush 
machine more efficient, both in terms of space and time. The combination of these 
optimizations results in high, sustained throughput. For example, if ... 

10 Research session 1: querying xml & semistructured data / query languages: XPath
satisfiability in the presence of DTDs
Michael Benedikt, Wenfei Fan, Floris Geerts 

June 2005 Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART 
symposium on Principles of database systems 
Publisher: ACM Press

Full text available: pdf(209.18 KB) Additional Information: full citation, abstract, references

We study the satisfiability problem associated with XPath in the presence of DTDs. This is 
the problem of determining, given a query p in an XPath fragment and a DTD D, whether 
or not there exists an XML document T such that T conforms to D and the answer of p on
T is nonempty. We consider a variety of XPath fragments widely used in practice, and 
investigate the impact of different XPath operators on satisfiability analysis. We first study 
the pro... 

11 Research session: XML query processing #1: Rewriting XPath queries using
materialized views

Wanhong Xu, Z. Meral Ozsoyoglu 

August 2005 Proceedings of the 31st international conference on Very large data 
bases VLDB '05 

Publisher: VLDB Endowment 

Full text available: pdf(373.63 KB) Additional Information: full citation, abstract, references, index terms

As a simple XML query language but with enough expressive power, XPath has become 
very popular. To expedite evaluation of XPath queries, we consider the problem of 
rewriting XPath queries using materialized XPath views. This problem is very important 
and arises not only from query optimization in server side but also from semantic caching 
in client side. We consider the problem of deciding whether there exists a rewriting of a 
query using XPath views and the problem of finding minimal rewritings ... 

12 DB-IR-2 (databases and information retrieval): web and XML text search: Processing
content-oriented XPath queries
Borkur Sigurbjornsson, Jaap Kamps, Maarten de Rijke 

November 2004 Proceedings of the thirteenth ACM international conference on 
Information and knowledge management CIKM '04 

Publisher: ACM Press 

Full text available: pdf(237.89 KB) Additional Information: full citation, abstract, references, index terms

Document-centric XML collections contain text-rich documents, marked up with XML tags 
that add lightweight semantics to the text. Querying such collections calls for a hybrid 
query language: the text-rich nature of the documents suggests a content-oriented (IR) 
approach, while the mark-up allows users to add structural constraints to their IR queries. 
Hybrid queries tend to be more expressive, which should lead, in principle, to better
retrieval performance. In practice, the processing of t ... 



Keywords: XML retrieval, XPath, content and structure 



13 XML and semistructured data querying: XPath lookup queries in P2P networks 
Angela Bonifati, Ugo Matrangolo, Alfredo Cuzzocrea, Mayank Jain

November 2004 Proceedings of the 6th annual ACM international workshop on Web
information and data management 

Publisher: ACM Press 

Full text available: pdf(263.77 KB) Additional Information: full citation, abstract, references, index terms

We address the problem of querying XML data over a P2P network. In P2P networks, the 
allowed kinds of queries are usually exact-match queries over file names. We discuss the 
extensions needed to deal with XML data and XPath queries. A single peer can hold a 
whole document or a partial/complete fragment of the latter. Each XML 
fragment/document is identified by a distinct path expression, which is encoded in a 
distributed hash table. Our framework differs from content-based routing mechanisms, ... 

14 Research sessions: XML query efficiency: BLAS: an efficient XPath processing
system

Yi Chen, Susan B. Davidson, Yifeng Zheng

June 2004 Proceedings of the 2004 ACM SIGMOD international conference on 
Management of data 

Publisher: ACM Press 

Full text available: pdf(179.44 KB) Additional Information: full citation, abstract, references, citings

We present BLAS, a Bi-LAbeling based System, for efficiently processing complex XPath 
queries over XML data. BLAS uses P-labeling to process queries involving consecutive 
child axes, and D-labeling to process queries involving descendant axes traversal. The 
XML data is stored in labeled form, and indexed to optimize descendant axis traversals.
Three algorithms are presented for translating complex XPath queries to SQL expressions, 
and two alternate query engines are provided. Experimental result ... 

15 XML and information integration: XPath query transformation based on XSLT
stylesheets

Sven Groppe, Stefan Bottcher

November 2003 Proceedings of the 5th ACM international workshop on Web 
information and data management 

Publisher: ACM Press 

Full text available: pdf(230.70 KB) Additional Information: full citation, abstract, references, index terms

Whenever XML data must be shared by heterogeneous applications, transformations 
between different application-specific XML formats are necessary. The state-of-the-art 
method transforms entire XML documents from one application format into another e.g. 
by using an XSLT stylesheet, so that each application can work locally on its preferred 
format. In our approach, we use an XSLT stylesheet in order to transform a given XPath 
query such that we retrieve and transform only that part of the XML docum ... 

Keywords: XPath, XSLT, query rewriting, query transformation 



16 Poster Session: Processing XPath queries with XML summaries 
Takeharu Eda, Makoto Onizuka, Masashi Yamamuro 

October 2005 Proceedings of the 14th ACM international conference on Information 
and knowledge management CIKM '05 

Publisher: ACM Press 

Full text available: pdf(84.52 KB) Additional Information: full citation, abstract, references, index terms

Range labeling and structural joins are well-studied techniques for efficiently processing 
XPath queries. However, when XPath queries become long, many times of structural joins 
are required. To solve this problem, we developed a method to reduce the number of 
joins and nodes read from the disk using strong DataGuides. Our method can process 
single paths without any joins and twig patterns with joins amongst branching nodes and 
leaves in queries. Experimental results verified that our approach o ... 

Keywords: DataGuides, XML, XPath, databases, structural joins 



17 XML processing: Conditional XPath, the first order complete XPath dialect
Maarten Marx

June 2004 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART 
symposium on Principles of database systems PODS '04 
Publisher: ACM Press

Full text available: pdf(234.56 KB) Additional Information: full citation, abstract, references

XPath is the W3C-standard node addressing language for XML documents. XPath is still
under development and its technical aspects are intensively studied. What is missing at 
present is a clear characterization of the expressive power of XPath, be it either 
semantical or with reference to some well established existing (logical) formalism. Core 
XPath (the logical core of XPath 1.0 defined by Gottlob et al.) cannot express queries with 
conditional paths as exemplified by "do a child step, while ... 

18 Document querying and transformation: XPath on left and right sides of rules: toward
compact XML tree rewriting through node patterns
Jean-Yves Vion-Dury 

November 2003 Proceedings of the 2003 ACM symposium on Document engineering 
Publisher: ACM Press 

Full text available: pdf(224.44 KB) Additional Information: full citation, abstract, references, index terms

XPath [3, 5] is a powerful and quite successful language able to perform complex node 
selection in trees through compact specifications. As such, it plays a growing role in many 
areas ranging from schema specifications, designation and transformation languages to 
XML query languages. Moreover, researchers have proposed elegant and tractable formal 
semantics [8, 9, 10, 14], fostering various works on mathematical properties and 
theoretical tools [10, 13, 12, 14]. We propose here a novel way to con ... 

19 Research sessions: path indexing: Accelerating XPath location steps
Torsten Grust 

June 2002 Proceedings of the 2002 ACM SIGMOD international conference on 
Management of data SIGMOD '02 

Publisher: ACM Press 

Full text available: pdf(1.12 MB) Additional Information: full citation, abstract, references, citings, index terms

This work is a proposal for a database index structure that has been specifically designed 
to support the evaluation of XPath queries. As such, the index is capable to support all 
XPath axes (including ancestor, following, preceding-sibling, descendant-or-self, etc.). 
This feature lets the index stand out among related work on XML indexing structures 
which had a focus on regular path expressions (which correspond to the XPath axes 
children and descendant-or-self plus name tests). I ... 

20 Paper session IR-1 (information retrieval): XML retrieval: Structured queries in XML
retrieval

Jaap Kamps, Maarten Marx, Maarten de Rijke, Borkur Sigurbjornsson 
October 2005 Proceedings of the 14th ACM international conference on Information 
and knowledge management CIKM '05 

Publisher: ACM Press 

Full text available: pdf(260.03 KB) Additional Information: full citation, abstract, references, index terms

Document-centric XML is a mixture of text and structure. With the increased availability of 
document-centric XML content comes a need for query facilities in which both structural 
constraints and constraints on the content of the documents can be expressed. How does 
the expressiveness of languages for querying XML documents help users to express their 
information needs? We address this question from both an experimental and a theoretical 
point of view. Our experimental analysis compares a struct ... 

Keywords: XML retrieval, XPath, full-text XML querying 